Big Learning with Little RAM

Authors

D. Sculley

Daniel Golovin

Abstract

In large-scale machine learning, available memory (RAM) is often a key constraint, both during model training and when making new predictions. In this paper, we reduce memory cost by projecting our weight vector β ∈ R^d onto a coarse discrete set using randomized rounding. Because the values of the discrete set can be stored more compactly than standard 32-bit float encodings, this reduces RAM usage by 50% during training and by up to 90% at prediction time. Theoretical analysis provides safety guarantees that bound the regret added by this projection. Empirical evaluation confirms excellent results in practice, adding only 0.01% to logistic loss in testing.
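The randomized rounding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform grid with spacing `resolution` and a signed integer index of `bits` bits, and the names `randomized_round`, `quantize`, and `dequantize` are hypothetical. Each weight is rounded up or down to a neighbouring grid point with probabilities chosen so that the rounded value equals the original in expectation.

```python
import math
import random


def randomized_round(value, resolution, rng=random):
    """Return an integer grid index k such that k * resolution is one of the
    two grid points nearest to value, chosen so that the expected value of
    k * resolution equals value (unbiased rounding)."""
    scaled = value / resolution
    lower = math.floor(scaled)
    frac = scaled - lower  # fractional part, in [0, 1)
    # Round up with probability equal to the fractional part.
    return lower + 1 if rng.random() < frac else lower


def quantize(weights, resolution, bits=16, rng=random):
    """Map each weight to a grid index, clipped to the signed range that
    fits in `bits` bits (an assumed encoding, chosen for illustration)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [min(hi, max(lo, randomized_round(w, resolution, rng)))
            for w in weights]


def dequantize(indices, resolution):
    """Recover approximate weights from grid indices."""
    return [k * resolution for k in indices]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.5, -0.3, 0.1234]
    indices = quantize(weights, resolution=0.25, rng=rng)
    print(indices, dequantize(indices, 0.25))
```

Under these assumptions, storing a 16-bit index in place of a 32-bit float would halve the storage per weight, and a coarser grid with fewer bits would save more. The rounding error of any single weight stays below `resolution` as long as the index is not clipped.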

Similar resources

The Gamma Operator for Big Data Summarization on an Array DBMS

SciDB is a parallel array DBMS that provides multidimensional arrays, a query language and basic ACID properties. In this paper, we introduce a summarization matrix operator that computes sufficient statistics in one pass and in parallel on an array DBMS. Such sufficient statistics benefit a big family of statistical and machine learning models, including PCA, linear regression and variable sel...

Large-Scale Learning with Less RAM via Randomization

We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this reduces RAM usage by more than 50% during training and by up to 95% when making predictions from a fixed model, with almost no loss in accuracy. We also show that randomized counting ...

On the Interrelationships among Undergraduate English Foreign Language Learners’ Speaking Ability, Personality Traits, and Learning Styles

The vital role individual differences, such as personality variation, play has long been discussed as the origin of different learning abilities. Accordingly, a cross-sectional survey and a descriptive study was conducted. Data was gathered from a sample of 150 students of both genders (107 females and 43 males) with an age range of 19-22. The translated and validated versions of the Big Five p...

Cultural Components and Subcomponents in Two Persian and English Language Teaching Textbooks: A Comparative Study

The present qualitative research, for the first time, aimed at comparing and contrasting the extent cultural components and subcomponents are represented in the elementary levels of A Course in General Persian and Top Notch Series as foreign language teaching textbooks. The adapted checklist of Lee's Big ‘C' and little ‘c' cultural components (2009) was used for the current study. After conten...

Using Big Data for Predicting Freshmen Retention

Traditional research in student retention is survey-based, relying on data collected from questionnaires, which is not optimal for proactive prediction and real-time decision (student intervention) support. Machine learning approaches have their own limitations. Therefore, in this research, we propose a big data approach to formulating a predictive model. We used commonly available (student dem...

Publication date: 2012
